This tutorial demonstrates how we can access and process Sentinel-2 data in the cloud using the R packages rstac [@rstac] and gdalcubes [@gdalcubes].
Two examples, the creation of composite images and a more complex time series analysis, will introduce important functions of both packages.
Other packages used in this tutorial include stars [@stars] and tmap [@tmap] for creating interactive maps, sf [@sf] for processing vector data, and colorspace [@colorspace] for visualizations with accessible colors.
library(sf)
library(rnaturalearth)
library(tidyverse)
library(downloader)
library(tmap)
library(gdalcubes)
country = ne_states(country = "germany", returnclass = "sf")
hessen = country %>% filter(name == "Hessen")
# ---- Official geometry data of the municipal areas (Bundesamt für Kartographie und Geodäsie)
# https://gdz.bkg.bund.de/index.php/default/open-data/verwaltungsgebiete-1-250-000-mit-einwohnerzahlen-ebenen-stand-31-12-vg250-ew-ebenen-31-12.html
if (!file.exists(paste0(tutorialDir,"municipial.zip"))){
downloader::download(url ="https://daten.gdz.bkg.bund.de/produkte/vg/vg250-ew_ebenen_1231/aktuell/vg250-ew_12-31.tm32.shape.ebenen.zip", destfile = paste0(tutorialDir,"municipial.zip"))
# Only the municipality geometries are extracted
unzip(zipfile = paste0(tutorialDir,"municipial.zip"),
files = c("vg250-ew_12-31.tm32.shape.ebenen/vg250-ew_ebenen_1231/VG250_GEM.shp",
"vg250-ew_12-31.tm32.shape.ebenen/vg250-ew_ebenen_1231/VG250_GEM.dbf",
"vg250-ew_12-31.tm32.shape.ebenen/vg250-ew_ebenen_1231/VG250_GEM.shx",
"vg250-ew_12-31.tm32.shape.ebenen/vg250-ew_ebenen_1231/VG250_GEM.prj"),
exdir = paste0(tutorialDir,"municipial/"),
junkpaths = TRUE)
}
municipal_sf = st_read(paste0(tutorialDir,"municipial/VG250_GEM.shp"))
## Reading layer `VG250_GEM' from data source
## `/home/creu/edu/geoinfo/data/tutorial/municipial/VG250_GEM.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 11130 features and 25 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
## Projected CRS: ETRS89 / UTM zone 32N (N-E)
We aim to generate a cloud-free composite image of our study area for June and July 2021, and we use the rstac package to find suitable Sentinel-2 images. However, to use the bbox argument of the corresponding stac_search() function for spatial filtering, we first need to derive the bounding box and transform it to latitude/longitude (WGS84) coordinates, using the st_bbox() and st_transform() functions. In addition, we reproject the vector objects used for referencing.
hessen_3035 = st_transform(hessen,crs = 3035)
hessen_32632 = st_transform(hessen,crs = 32632)
bbox = st_bbox(hessen)
st_as_sfc(bbox) |>
st_transform("EPSG:4326") |>
st_bbox() -> bbox_wgs84
# Reproject the geometries from ETRS89 / UTM zone 32N (N-E) (EPSG:3044) to ETRS89-extended / LAEA Europe (EPSG:3035)
municipal_3035 = st_transform(municipal_sf, 3035)
lahntal_3035 = municipal_3035 %>% filter(GEN == "Lahntal")
lahntal_4326 = st_transform(lahntal_3035,crs = 4326)
lahntal_32632 = st_transform(lahntal_3035,crs = 32632)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(st_as_sfc(bbox)) + tm_polygons() + tm_shape(lahntal_3035) + tm_polygons()
The interactive map shows the bounding box of our study area Hessen together with the Lahntal municipality, which we will use later for a higher-resolution example.
Now we can specify our STAC API endpoint and post a STAC search request using the transformed bounding box, the datetime range, and the collection name “sentinel-s2-l2a-cogs”.
library(rstac)
s = stac("https://earth-search.aws.element84.com/v0")
items = s |>
stac_search(collections = "sentinel-s2-l2a-cogs",
bbox = c(bbox_wgs84["xmin"],bbox_wgs84["ymin"],
bbox_wgs84["xmax"],bbox_wgs84["ymax"]),
datetime = "2021-06-01/2021-07-30",
limit = 500) |>
post_request()
items
## ###STACItemCollection
## - matched feature(s): 237
## - features (237 item(s) / 0 not fetched):
## - S2B_32UMV_20210730_0_L2A
## - S2B_32UNV_20210730_0_L2A
## - S2B_32UMA_20210730_0_L2A
## - S2B_32UNA_20210730_0_L2A
## - S2B_32UNB_20210730_0_L2A
## - S2B_32UNC_20210730_0_L2A
## - S2A_32UMV_20210728_0_L2A
## - S2A_32UNV_20210728_0_L2A
## - S2A_32UMA_20210728_0_L2A
## - S2A_32UNA_20210728_0_L2A
## - ... with 227 more feature(s).
## - assets:
## thumbnail, overview, info, metadata, visual, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12, AOT, WVP, SCL
## - other field(s):
## type, stac_version, stac_extensions, context, numberMatched, numberReturned, features, links
By default, the result of the STAC API contains only up to 10 items, so we need to increase this value using the limit argument. Here, we got a list of 237 STAC items.
Looking at one of the items, we see:
names(items$features[[10]])
## [1] "type" "stac_version" "stac_extensions" "id"
## [5] "bbox" "geometry" "properties" "collection"
## [9] "assets" "links"
The assets element contains direct links to the image files for separate bands and the properties element contains a lot of useful metadata including cloud coverage, datetime, projection, and more:
items$features[[10]]$assets$B05
## $title
## [1] "Band 5"
##
## $type
## [1] "image/tiff; application=geotiff; profile=cloud-optimized"
##
## $roles
## [1] "data"
##
## $gsd
## [1] 20
##
## $`eo:bands`
## $`eo:bands`[[1]]
## $`eo:bands`[[1]]$name
## [1] "B05"
##
## $`eo:bands`[[1]]$center_wavelength
## [1] 0.7039
##
## $`eo:bands`[[1]]$full_width_half_max
## [1] 0.019
##
##
##
## $href
## [1] "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/32/U/NA/2021/7/S2A_32UNA_20210728_0_L2A/B05.tif"
##
## $`proj:shape`
## [1] 5490 5490
##
## $`proj:transform`
## [1] 20 0 499980 0 -20 5600040 0 0 1
items$features[[10]]$properties$`eo:cloud_cover`
## [1] 99.69
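Since each feature carries its cloud cover in the properties list, we can inspect the distribution over all returned scenes before filtering. Below is a small sketch, assuming the items object from the search above; get_cloud_cover is our own illustrative helper, not part of rstac:

```r
# Sketch: extract per-scene cloud cover from a list of STAC features.
# get_cloud_cover is an illustrative helper, not an rstac function.
get_cloud_cover = function(features) {
  vapply(features, function(f) f$properties[["eo:cloud_cover"]], numeric(1))
}
# With the search result from above:
# cc = get_cloud_cover(items$features)
# summary(cc); hist(cc, breaks = 20)
```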
Next, we load the gdalcubes package and use this list of features from rstac to create a gdalcubes image collection object with the stac_image_collection() function. Compared to using gdalcubes with imagery on local storage, this does not need to open and read metadata from all files because STAC items already contain all relevant metadata including datetime, spatial extent, and how files relate to bands. As a result, creation of an image collection from STAC is quite fast.
library(gdalcubes)
s2_collection = stac_image_collection(items$features)
s2_collection
## Image collection object, referencing 237 images with 18 bands
## Images:
## name left top bottom right
## 1 S2B_32UMV_20210730_0_L2A 8.409843 49.65263 48.66353 9.135200
## 2 S2B_32UNV_20210730_0_L2A 8.999737 49.65271 48.65539 10.520582
## 3 S2B_32UMA_20210730_0_L2A 8.752525 50.53915 49.56450 9.137718
## 4 S2B_32UNA_20210730_0_L2A 8.999734 50.55220 49.55481 10.549314
## 5 S2B_32UNB_20210730_0_L2A 9.106578 51.45007 50.45353 10.579529
## 6 S2B_32UNC_20210730_0_L2A 9.470607 52.34716 51.35264 10.611369
## datetime srs
## 1 2021-07-30T10:27:21 EPSG:32632
## 2 2021-07-30T10:27:18 EPSG:32632
## 3 2021-07-30T10:27:08 EPSG:32632
## 4 2021-07-30T10:27:03 EPSG:32632
## 5 2021-07-30T10:26:49 EPSG:32632
## 6 2021-07-30T10:26:34 EPSG:32632
## [ omitted 231 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 B01 0 1 237
## 2 B02 0 1 237
## 3 B03 0 1 237
## 4 B04 0 1 237
## 5 B05 0 1 237
## 6 B06 0 1 237
## 7 B07 0 1 237
## 8 B08 0 1 237
## 9 B09 0 1 237
## 10 B11 0 1 237
## 11 B12 0 1 237
## 12 B8A 0 1 237
## 13 overview:B02 0 1 237
## 14 overview:B03 0 1 237
## 15 overview:B04 0 1 237
## 16 visual:B02 0 1 237
## 17 visual:B03 0 1 237
## 18 visual:B04 0 1 237
However, this function takes some further useful arguments. First, our image collection does not contain the SCL band, which provides per-pixel information on clouds and cloud shadows. This band is ignored by default because it is missing the eo:bands properties in the STAC API response. To include it anyway, we can specify asset names manually using the asset_names argument. Second, the result contains all images, including some with almost no clear pixels. To reduce the number of images, we can provide a function as the property_filter argument; here we keep only images with a cloud cover below 5%. This function receives the properties element (a list) of a STAC item as argument and must return a single logical value; an image is ignored if the function returns FALSE.
assets = c("B01","B02","B03","B04","B05","B06", "B07","B08","B8A","B09","B11","SCL")
s2_collection = stac_image_collection(items$features,
asset_names = assets,
property_filter = function(x) {x[["eo:cloud_cover"]] < 5})
## Warning in stac_image_collection(items$features, asset_names = assets,
## property_filter = function(x) {: STAC asset with name 'SCL' does not include
## eo:bands metadata and will be considered as a single band source
s2_collection
## Image collection object, referencing 32 images with 12 bands
## Images:
## name left top bottom right
## 1 S2B_32UNV_20210730_0_L2A 8.999737 49.65271 48.65539 10.520582
## 2 S2B_32UMV_20210723_0_L2A 7.614286 49.65263 48.65703 9.135200
## 3 S2A_32UMV_20210721_0_L2A 7.615671 49.64841 48.87795 8.001956
## 4 S2A_32UMA_20210721_0_L2A 7.588101 50.55096 49.55650 8.445594
## 5 S2A_32UMV_20210718_0_L2A 7.614286 49.65263 48.65703 9.135200
## 6 S2A_32UNV_20210718_0_L2A 8.999737 49.65271 48.66025 10.516642
## datetime srs
## 1 2021-07-30T10:27:18 EPSG:32632
## 2 2021-07-23T10:37:19 EPSG:32632
## 3 2021-07-21T10:47:13 EPSG:32632
## 4 2021-07-21T10:47:02 EPSG:32632
## 5 2021-07-18T10:37:20 EPSG:32632
## 6 2021-07-18T10:37:15 EPSG:32632
## [ omitted 26 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 B01 0 1 32
## 2 B02 0 1 32
## 3 B03 0 1 32
## 4 B04 0 1 32
## 5 B05 0 1 32
## 6 B06 0 1 32
## 7 B07 0 1 32
## 8 B08 0 1 32
## 9 B09 0 1 32
## 10 B11 0 1 32
## 11 B8A 0 1 32
## 12 SCL 0 1 32
As a result we get an image collection with 32 images and the SCL band.
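The property_filter function can combine several conditions. Below is a sketch of a stricter filter; the sentinel:data_coverage property is an assumption about this collection's metadata and may be missing from some items, so items without it are treated as acceptable:

```r
# Sketch: keep only images with low cloud cover and, where the (assumed)
# sentinel:data_coverage property is present, sufficient data coverage.
strict_filter = function(x) {
  cc = x[["eo:cloud_cover"]]
  dc = x[["sentinel:data_coverage"]]
  isTRUE(!is.null(cc) && cc < 5 && (is.null(dc) || dc > 50))
}
# s2_strict = stac_image_collection(items$features,
#                                   asset_names = assets,
#                                   property_filter = strict_filter)
```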
The next step in gdalcubes is to specify the geometry of our target data cube, which is called the data cube view. The data cube view is independent of specific image collections and hence does not contain information on spectral bands. In the following code, we use the cube_view() function to create a coarse-resolution data cube with a cell size of 200 m x 200 m x 30 days, using the UTM zone 32N projection (EPSG:32632):
v.hessen.overview = cube_view(srs="EPSG:32632",
dx=200, dy=200,
dt="P30D",
aggregation="median",
resampling = "average",
extent=list(t0 = "2021-06-01",
t1 = "2021-07-30",
left=st_bbox(hessen_32632)["xmin"]-1000,
right=st_bbox(hessen_32632)["xmax"]+1000,
top=st_bbox(hessen_32632)["ymax"] + 1000,
bottom=st_bbox(hessen_32632)["ymin"]-1000)
)
v.hessen.overview
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2021-06-01 2021-07-30 2 P30D
## y 5470447.05595912 5723847.05595912 1267 200
## x 411011.768552299 587211.768552299 881 200
##
## SRS: "EPSG:32632"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"
Notice that the extent of the data cube may have been slightly enlarged because there is no such thing as a partial pixel. The resampling and aggregation methods define how pixels are resampled in space and how pixel values from different days within the same data cube cell are aggregated while aligning images with the target cube geometry. Our data cube geometry has 881 x 1267 pixels in space and 2 pixels in the time direction.
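The pixel counts can be checked by hand from the printed cube view: the snapped extent divided by the cell size gives the number of whole cells per axis. A quick arithmetic sketch:

```r
# Number of cells per axis: snapped extent divided by the 200 m cell size
nx = ceiling((587211.768552299 - 411011.768552299) / 200)   # x axis
ny = ceiling((5723847.05595912 - 5470447.05595912) / 200)   # y axis
# 60 days at a 30-day step give two time slices
nt = ceiling(as.numeric(as.Date("2021-07-30") - as.Date("2021-06-01") + 1) / 30)
c(nx, ny, nt)  # 881 1267 2
```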
Afterwards, we can combine our image collection and cube view to create, process, and plot our actual data cube. To ignore pixels that have been classified as clouds or cloud shadows in individual images, we first create a mask object that tells gdalcubes that pixels with the values 3 (cloud shadows), 8 (clouds, medium probability), or 9 (clouds, high probability) in the SCL band will not contribute to the data cube values and will be ignored during the temporal aggregation step.
The raster_cube() function then takes the image collection, the data cube view, and the mask, and creates a virtual data cube. Calling this function does not start any computations or data transfers but simply returns a proxy object that knows what to do. The functions select_bands() and filter_geom(), which subset spectral bands and crop a data cube by a polygon respectively, both take a data cube as input and produce a proxy object (a virtual data cube) as a result. Calling plot() eventually starts all the computations and plots the result. Computations are multithreaded (we use up to 8 threads here) and no intermediate results of the operations are written to disk.
S2.mask = image_mask("SCL", values = c(3,8,9))
gdalcubes_options(threads = 8)
rgb = raster_cube(s2_collection, v.hessen.overview, S2.mask) |>
select_bands(c("B02", "B03", "B04")) |>
filter_geom(hessen_32632$geom) |>
plot(rgb = 3:1, zlim=c(0,1500))
If we are interested in a smaller area, at higher resolution, we can create a data cube with a different data cube view as in the following example.
v.lahntal = cube_view(view = v.hessen.overview,
dx=10,
dy=10,
extent=list(left=st_bbox(lahntal_32632)["xmin"],
right=st_bbox(lahntal_32632)["xmax"],
top=st_bbox(lahntal_32632)["ymax"],
bottom=st_bbox(lahntal_32632)["ymin"]
)
)
raster_cube(s2_collection, v.lahntal, S2.mask) |>
select_bands(c("B02", "B03", "B04")) |>
plot(rgb = 3:1, zlim=c(0,1500))
The gdalcubes package comes with some built-in operations to process data cubes. The following operations produce a derived data cube from one or more input data cubes.
| Operator | Description |
|---|---|
| apply_pixel | Apply arithmetic expressions on band values per pixel. |
| fill_time | Fill missing values by simple time series interpolation. |
| filter_pixel | Filter pixels based on logical expressions. |
| filter_geom | Filter pixels that do not intersect with a given input geometry. |
| join_bands | Combine bands of two or more identically shaped input data cubes. |
| reduce_space | Apply a reducer function over time slices of a data cube. |
| reduce_time | Apply a reducer function over individual pixel time series. |
| select_bands | Select a subset of a data cube's bands. |
| window_time | Apply a moving window reducer or kernel over individual pixel time series. |
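For example, window_time applies a kernel along each pixel's time series, conceptually like a centered moving average. A plain-R analogue for a single time series (moving_avg is our own illustrative helper, not a gdalcubes function):

```r
# Plain-R analogue of a centered moving-average kernel, as window_time
# would apply it along a pixel time series; edges with incomplete
# windows become NA.
moving_avg = function(x, k = 3) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}
moving_avg(c(1, 2, 3, 4, 5))  # NA 2 3 4 NA
```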
There are some more functions for exporting data cubes as netCDF or (cloud-optimized) GeoTIFF files, to read data cubes from netCDF files, to compute summary statistics over polygons (zonal statistics), to query data cube values by irregular spatiotemporal points, and to create animations.
In the example below, we compute the normalized difference vegetation index (NDVI), drop values with NDVI <= 0, and plot the result. A custom color palette from the colorspace package maps lower NDVI values to light yellow and higher values to green.
library(colorspace)
ndvi.col = function(n) {
rev(sequential_hcl(n, "Green-Yellow"))
}
raster_cube(s2_collection, v.hessen.overview, S2.mask) |>
select_bands(c("B04", "B08")) |>
filter_geom(hessen_32632$geom) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
filter_pixel("NDVI > 0") |>
plot(key.pos = 1, zlim=c(0,1), col = ndvi.col)
We can see that some additional water areas with NDVI < 0 have been set to NA.
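The expression passed to apply_pixel() is ordinary band arithmetic; writing the same formula in plain R makes its behavior easy to verify (ndvi is just an illustrative helper, not part of gdalcubes):

```r
# The NDVI formula from the apply_pixel() expression above, in plain R.
# ndvi() is an illustrative helper, not a gdalcubes function.
ndvi = function(b08, b04) (b08 - b04) / (b08 + b04)
ndvi(5000, 1000)  # dense vegetation: strong NIR (B08) reflectance, NDVI well above 0
ndvi(1000, 5000)  # water-like pixel: negative NDVI, removed by filter_pixel("NDVI > 0")
```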
We can convert data cubes to stars objects and use tmap for interactive mapping:
library(stars)
library(tmap)
raster_cube(s2_collection, v.hessen.overview, S2.mask) |>
select_bands(c("B04", "B08")) |>
filter_geom(hessen_32632$geom) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
filter_pixel("NDVI > 0") |>
st_as_stars() |>
tm_shape() + tm_raster()
## stars object downsampled to 833 by 1199 cells. See tm_shape manual (argument raster.downsample)
## Tip: rasters can be shown as layers instead of facets by setting tm_facets(as.layers = TRUE).
This example has shown how satellite imagery can be accessed and analyzed in the cloud using STAC and gdalcubes. The analysis has been rather simple, creating cloud-free composite images for only two months; still, the data originated from more than 30 Sentinel-2 images, and downloading all of them first would have been tedious.
The approach above is straightforward and makes it simple to run the analysis for different areas of interest, which also makes it much easier to check the transferability of methods. Part two of the tutorial will focus on more complex time series processing for a smaller area.